METHOD AND APPARATUS FOR TESTING REQUEST-RESPONSE SERVICE
USING LIVE CONNECTION TRAFFIC 

This application is based on Provisional Application Serial No. 60/189,734, filed March
16, 2000.

This application includes subject matter that is protected by Copyright Law. All rights 
reserved. 

BACKGROUND OF THE INVENTION 

Technical Field

The present invention relates generally to testing a request-response service using live 
connection traffic. One such request-response service involves high-performance, fault-tolerant 
HTTP, streaming media and applications delivery over a content delivery network (CDN). 

Description of the Related Art

It is well known to deliver HTTP and streaming media using a content delivery network
(CDN). A CDN is a self-organizing network of geographically distributed content delivery
nodes that are arranged for efficient delivery of digital content (e.g., Web content, streaming
media and applications) on behalf of third party content providers. A request from a requesting
end user for given content is directed to a "best" replica, where "best" usually means that the item
is served to the client quickly compared to the time it would take to fetch it from the content
provider origin server. An entity that provides a CDN is sometimes referred to as a content
delivery network service provider or CDNSP.

Typically, a CDN is implemented as a combination of a content delivery infrastructure, a 
request-routing mechanism, and a distribution infrastructure. The content delivery infrastructure 
usually comprises a set of "surrogate" origin servers that are located at strategic locations (e.g.,
Internet network access points, Internet Points of Presence, and the like) for delivering copies of 
content to requesting end users. The request-routing mechanism allocates servers in the content 
delivery infrastructure to requesting clients in a way that, for web content delivery, minimizes a 
given client's response time and, for streaming media delivery, provides for the highest quality. 
The distribution infrastructure consists of on-demand or push-based mechanisms that move
content from the origin server to the surrogates. An effective CDN serves frequently-accessed 
content from a surrogate that is optimal for a given requesting client. In a typical CDN, a single
service provider operates the request-routers, the surrogates, and the content distributors. In 
addition, that service provider establishes business relationships with content publishers and acts 
on behalf of their origin server sites to provide a distributed delivery system. A well-known 
commercial CDN service that provides web content and media streaming is provided by Akamai
Technologies, Inc. of Cambridge, Massachusetts. 

CDNSPs may use content modification to tag content provider content for delivery. 
Content modification enables a content provider to take direct control over request-routing 
without the need for specific switching devices or directory services between the requesting 
clients and the origin server. Typically, content objects are made up of a basic structure that
includes references to additional, embedded content objects. Most web pages, for example, 
consist of an HTML document that contains plain text together with some embedded objects, 
such as .gif or .jpg images. The embedded objects are referenced using embedded HTML 
directives. A similar scheme is used for some types of streaming content which, for example, 
may be embedded within an SMIL document. Embedded HTML or SMIL directives tell the
client to fetch embedded objects from the origin server. Using a CDN content modification 
scheme, a content provider can modify references to embedded objects so that the client is told to 
fetch an embedded object from the best surrogate (instead of from the origin server). 

In operation, when a client makes a request for an object that is being served from the 
CDN, an optimal or "best" edge-based content server is identified. The client browser then
makes a request for the content from that server. When the requested object is not available from
the identified server, the object may be retrieved from another CDN content server or, failing 
that, from the origin server. 

A well-managed content delivery network implements frequent upgrades to its production 
software, e.g., the software used to provide HTTP content delivery from its edge-based content
servers. Thus, for example, as new content or "edge" server functionalities are added to the 
network, they need to be tested, debugged, rewritten and, ultimately, deployed into production 
across the network as a whole. An ongoing challenge in testing such new software is the
inability to reproduce real-world workload on new versions of the software short of deploying 
them in the field. While testing a CDN server with real-world traffic (a "live load test") would
be desirable, it has not been possible to do so without having the CDN server interact with the 
outside world. This interaction may cause significant problems if the version under live test has 
bugs or otherwise interferes with conventional server functions. Additionally, when field- 
deployment is used, there is no convenient mechanism for checking if a new version of the 
software under test produces equivalent output to the old version, namely, the production
version. 

Generally, there are a number of known approaches to testing software. Regression 
testing refers to the technique of constructing test cases and executing the software against those 
cases. Regression testing, while effective in avoiding the recurrence of bugs, is labor-intensive and thus
costly. Stress or "load" testing refers to the technique of simulating the working environment of
the software using a testbed or equivalent architecture. While stress/load testing is useful in 
evaluating system limits, finding representative workloads to use for the test is always difficult. 
Trace-based testing refers to the technique of playing back to the software under test a trace of 
activity obtained from a production version. This technique, although generally useful, may lead
to inaccurate conclusions as, in some applications (like a CDN caching server), traces go stale
very quickly and/or do not include information that might be needed to evaluate the new version 
effectively. Field-deployment testing, as its name suggests, refers to the technique of testing a 
version of the software with a real-world workload. As noted above, when field-deployment is 
used, there is no convenient way of isolating the software under test from interacting with real
users and customers, and there is no mechanism for checking if a new version of the software
under test produces equivalent output to the old version, namely, the production version. Error 
detection is hard, and debugging is difficult because there is limited information capture and the 
developer is often unable to deploy instrumented code. In addition, during live field-testing, the 
developer is not able to destructively test the code, i.e., to make the software less robust (e.g.,
letting it crash) in the face of problems instead of patching over them, in order to assist in
tracking down problems. 

It would be desirable to be able to provide a way to test IP-networking-based servers 
(either software, hardware, or some combination thereof) with live traffic and to compare the 
results of these tests with currently running CDN traffic. Such a method also could be used to
test network-based servers before their actual deployment. The present invention addresses this

BRIEF SUMMARY OF THE INVENTION

The present invention provides for a method and apparatus for comparison of network 
systems using live traffic in real-time. The inventive technique presents real-world workload in 
real-time with no external impact (i.e., no impact on customers of the service, nor the system
providing the service), and it enables comparison against a production system for correctness
verification. 

A preferred embodiment of the invention is a testing tool for the pseudo-live testing of 
CDN content staging servers, although this is not a limitation of the invention. When deployed, 
production content staging servers (also referred to as reverse proxies or surrogate origin servers)
sit behind a switch providing connectivity to the Internet. These switches often have a
port-monitoring feature, used for management and monitoring, which allows all traffic going through
the switch to be seen on the configured port. According to the invention, traffic between clients 
and the live production CDN servers is monitored by a simulator device, which replicates this 
workload onto a system under test (SUT). The simulator provides high-fidelity duplication
(ideally down to the ethernet frame level), while also compensating for differences in the output
between the system under test and the live production system. Additionally, the simulator detects 
divergences between the outputs from the SUT and live production servers, allowing detection of 
erroneous behavior. To the extent possible, the SUT is completely isolated from the outside 
world so that errors or crashes by this system do not affect either the CDN customers or the end
users. Thus, the SUT does not interact with end users (i.e., their web browsers). Consequently,
the simulator serves as a proxy for the clients. By basing its behavior off the packet stream sent 
between client and the live production system, the simulator can simulate most of the oddities of 
real-world client behavior, including malformed packets, timeouts, dropped traffic and reset 
connections, among others. 

In a preferred embodiment, the main functionality of the tool is provided by an External
World Simulator (EWS). The EWS listens promiscuously on a CDN region switch interface,
rewrites incoming client packets bound for a production server to be routed to a beta server being 
tested, optionally compares the content and headers of the beta reply to the production reply, and 
black-holes (i.e., terminates) the client-bound traffic from the beta server. A primary advantage
this tool provides is the ability to put servers of an unknown quality into a live environment and
to receive notification if the client experience differs from a known standard (as provided by the
production servers). 

The simulator may provide varying degrees of validation. Thus, for example, the 
simulator may provide substantially limited validation that suffices for testing new versions for
crashes and long-term memory leaks. The simulator may test for "identical" output, wherein the
output of the system under test is checked for byte-for-byte equality with the production system. 
The simulator may also check for "equivalent" output, wherein the output of the SUT and the 
production system are checked for logical equivalence (isomorphism). This type of validation 
typically involves use of specific application-level logic. The particular equivalence checking
logic will depend on the functionalities being implemented, of course.

The foregoing has outlined some of the more pertinent features and technical advantages 
of the present invention. These features and advantages should be construed to be merely 
illustrative. Many other beneficial results can be attained by applying the disclosed invention in 
a different manner or by modifying the invention as will be described. Accordingly, other
features and a fuller understanding of the invention may be had by referring to the following
Detailed Description of the Preferred Embodiment. 

BRIEF DESCRIPTION OF THE DRAWINGS

Figure 1 is a block diagram of a known content delivery network in which the present 
invention may be implemented; 

Figure 2 is a simplified block diagram of a known CDN content server; 

Figure 3 is a simplified block diagram of how a CDN region may be implemented in the
prior art;

Figure 4 is a block diagram of the inventive live-load testing system infrastructure of the 
present invention; 

Figure 5 is a block diagram illustrating a preferred architecture of the software modules 
that comprise the External World Simulator;

Figures 6-7 are state diagrams illustrating how the EWS manages (opens and closes)
connections between the production ghost(s) and the invisible ghost(s) according to the preferred
embodiment; and

Figures 8-14 illustrate the operation of the EWS for a given connection between a
requesting client and a production server.

DESCRIPTION OF THE PREFERRED EMBODIMENT OF THE INVENTION

Figure 1 is a diagram showing an illustrative content delivery network in which the 
present invention may be implemented. The content delivery service comprises a preferably 
global content delivery network (CDN) 100 of content delivery server regions 102a-n, a domain 
name service (DNS) system 104, and a content modification or "initiator" tool 106 that allows 
content to be tagged for inclusion on the network. DNS system 104 receives network mapping 
data from a map maker 107, which receives inputs from monitoring agents 109 distributed 
throughout the Internet. Agents typically perform various tests and monitor traffic conditions to 
identify Internet congestion problems. The map maker 107 takes the data generated from the 
agents and generates one or more maps detailing Internet traffic conditions. Generally, the 
content delivery service allows the network of content delivery server regions 102a-n to serve a 
large number of clients efficiently. Each region may include one or more content servers, with 
multiple content servers typically sharing a local area network (LAN) backbone. Although not 
meant to be limiting, a typical server is an Intel Pentium-based caching appliance running the 
Linux operating system with a large amount of RAM and disk storage. As also seen in Figure 1, 
the content delivery service may include a network operations control center (NOC) 112 for 
monitoring the network to ensure that key processes are running, systems have not exceeded 
capacity, and that subsets of content servers (the so-called CDN regions 102) are interacting 
properly. A content provider operates an origin server (or server farm) 115 from which
requesting end users 119 would normally access the content provider's Web site via the Internet.
Use of the CDN avoids transit over the Internet for selected content as described below. The
content provider may also have access to a monitoring suite 114 that includes tools for both
real-time and historic analysis of customer data.

High-performance content delivery is provided by directing requests for web objects (e.g., 
graphics, images, streaming media, HTML and the like) to the content delivery service network. 
In one known technique, known as Akamai FreeFlow content delivery, HTTP and/or streaming 
media content is first tagged for delivery by the tool 106, which, for example, may be executed 
by a content provider at the content provider's web site 115. The initiator tool 106 converts
URLs that refer to streaming content to modified resource locators, called ARLs for convenience, 
so that requests for such media are served preferentially from the CDN instead of the origin
server. When an Internet user visits a CDN customer's site (e.g., origin server 115) and, for
example, selects a link to view or hear streaming media, the user's system resolves the domain in 
the ARL to an IP address. In particular, because the content has been tagged for delivery by the 
CDN, the URL modification, transparent to the user, cues a dynamic Domain Name Service
(dDNS) to query a CDN name server (or hierarchy of name servers) 104 to identify the 
appropriate media server from which to obtain the stream. The CDN typically implements a 
request-routing mechanism (e.g., under the control of maps generated from the monitoring agents 
109 and map maker 107) to identify an optimal server for each user at a given moment in time. 
Because each user is served from the optimal streaming server, preferably based on real-time
Internet conditions, streaming media content is served reliably and with the least possible packet 
loss and, thus, the best possible quality. Further details of a preferred dDNS-based request- 
routing mechanism are described in U.S. Patent No. 6,108,703, which is incorporated herein by 
reference.

Figure 2 illustrates a representative CDN content server 200. Typically, the content server 200 is
a Pentium-based caching appliance running an operating system kernel 202 (e.g., based on
Linux), a file system cache 204, CDN global host (or "ghost") software 206, TCP connection
manager 208, and disk storage 210. CDN ghost software 206 is useful to create a "hot" object
cache 212 for popular objects being served by the CDN. In operation, the content server 200
receives end user requests for content, determines whether the requested object is present in the
hot object cache or the disk storage, serves the requested object via HTTP (if it is present) or
establishes a connection to another content server or an origin server to attempt to retrieve the
requested object upon a cache miss. In a CDN such as described above with respect to Figure 1,
a set of CDN content servers may be organized and managed together in a peer-to-peer manner as
a CDN region. Figure 3 illustrates one such CDN region. In this example, which is merely
representative, the CDN region comprises two (2) sets of four (4) production servers 300a-h that
are interconnected over a common backnet 302, which may be a conventional ethernet 100BT
switch as illustrated. One or more ethernet switches 304a-b may be used as a front end to
interconnect the CDN region to the public Internet 306, an intranet, a virtual private network, or
the like. Although not meant to be limiting, the production servers may be architected as
illustrated in Figure 2 and described above.

A well-managed CDN has production servers that are frequently upgraded and enhanced 
with new software versions. As a CDN grows in size, however, it becomes very difficult to test
such new software and/or software versions given the scale of the network, the size of the
codebase, and the problems and deficiencies associated with laboratory or field-testing that have been
discussed above. The present invention addresses this problem through a novel live-load systems 
testing infrastructure and methodology which are now illustrated and described. 

Figure 4 illustrates an implementation of the testing infrastructure 400 in the context of a 
CDN region, which is an exemplary application testing environment. In this example, the 
infrastructure comprises an External World Simulator 402 that sits between the production
system and the system under test (SUT) 404. The EWS listens promiscuously on a CDN region
switch interface, rewrites incoming client packets bound for a production server to be routed to a 
beta server being tested, optionally compares the content and headers of the beta reply to the 
production reply, and black-holes (i.e., terminates) the client-bound traffic from the beta server.
An advantage this tool provides is the ability to put servers of an unknown quality into a live
environment and to receive notification if the client experience differs from a known standard (as
provided by the production servers). In this example, the production system is illustrated by the 
CDN production region comprising four (4) production ghost servers 406a-d and the ethernet 
front-end switch 408. The backnet is omitted for clarity. The SUT comprises a set of four (4) 
so-called "invisible" ghost servers 410a-d and the front-end switch 412. A backnet may be used
as well. Preferably, there is one invisible ghost server under test for every production ghost 
server, although this is not a requirement. As noted above, the External World Simulator 402 
monitors live traffic between the live production system and requesting clients (not shown) and 
replicates this workload onto the SUT 404. The EWS 402 provides high fidelity duplication 
(ideally down to the ethernet frame level), while compensating for differences in the output
between the SUT and the live production system. Additionally, the EWS detects divergences 
between the outputs for corresponding pairs of SUT and live production servers (e.g., servers 
406a and 410a, 406b and 410b, etc.), thereby allowing detection of erroneous behavior. 
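
By way of illustration only, the packet rewriting step may be sketched as follows. The sketch assumes a 20-byte IPv4 header without options; the names ip_checksum and redirect_to_beta are hypothetical and are not taken from the EWS itself. A complete implementation would also need to adjust the ethernet destination address and the TCP checksum, which covers the IP addresses by way of the pseudo-header.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Standard Internet checksum (RFC 1071) over len bytes of buf. */
static uint16_t ip_checksum(const uint8_t *buf, size_t len)
{
    uint32_t sum = 0;
    size_t i;

    for (i = 0; i + 1 < len; i += 2)
        sum += (uint32_t)((buf[i] << 8) | buf[i + 1]);
    if (i < len)                        /* odd trailing byte */
        sum += (uint32_t)(buf[i] << 8);
    while (sum >> 16)                   /* fold carries back in */
        sum = (sum & 0xFFFF) + (sum >> 16);
    return (uint16_t)~sum;
}

/*
 * Hypothetical helper: point a client packet at the beta server by
 * replacing the destination address of a 20-byte IPv4 header and
 * recomputing the header checksum.
 * Returns 0 on success, -1 if the header is not IPv4 with IHL == 5.
 */
static int redirect_to_beta(uint8_t *hdr, const uint8_t beta_addr[4])
{
    uint16_t csum;

    if ((hdr[0] >> 4) != 4 || (hdr[0] & 0x0F) != 5)
        return -1;

    memcpy(hdr + 16, beta_addr, 4);     /* destination address field */
    hdr[10] = 0;                        /* zero checksum before summing */
    hdr[11] = 0;
    csum = ip_checksum(hdr, 20);
    hdr[10] = (uint8_t)(csum >> 8);
    hdr[11] = (uint8_t)(csum & 0xFF);
    return 0;
}
```

Because the header checksum is recomputed after the destination address is replaced, running the checksum routine over the rewritten header again yields zero, which is the usual validity test applied by a receiver.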

Although Figure 4 illustrates a SUT with multiple invisible ghosts, this is not a limitation. 
The number of machines under test is variable, and may include just a single invisible ghost
server, a full region of servers (such as illustrated), multiple regions, and the like. In addition,
while preferably the infrastructure uses live system load for testing (i.e., CDN traffic is monitored 
and its traffic replicated in real-time to drive the SUT), a recorded trace may be captured by the 
EWS and replayed to the SUT at a later time for testing purposes. 

The term "invisible" is merely a shorthand reference to the fact that the SUT is
completely isolated from the outside world so that errors or crashes by this system do not affect
either the CDN's customers (content providers) or end users. In particular, the basic constraint 
that is enforced is that the SUT never interacts with end users (namely, their web browsers). 
Consequently, the EWS serves as a proxy for the clients. By basing its behavior off the packet
stream sent between clients and the live production system, the External World Simulator can
simulate most of the oddities of real-world client behavior including, without limitation, 
malformed packets, timeouts, dropped traffic and reset connections. Ideally, the SUT is able to 
emulate all outside entities (e.g., end user web browsers, customer web servers, DNS servers, 
network time services, and the like) to which the production ghost server talks in a conventional
CDN operation.

Although not meant to be limiting, the EWS preferably is a dual NIC, Intel/Linux-based 
machine running appropriate control routines for carrying out the above-described testing 
functionality. The production environment may be any commercial or proprietary Internet-, 
intranet- or enterprise-based content delivery network. An advantage this tool provides is the
ability to put servers of an unknown quality into a live environment and to receive notification if
the client experience differs from a known standard (as provided by the production servers). The 
tool may be augmented to allow one to route traffic from multiple production servers to a single
test server, enabling a more realistic performance projection tool. In addition, to handle greater
throughput, HTTP comparison can be disabled.

EWS enables monitoring of a production system to generate network-packet level
accurate traffic. This provides an extremely high-fidelity workload for the test system. The
external interaction may be at selectable test levels such as: HTTP request; IP packet; IP packet
and timing; or IP packet, timing and fragmentation. The EWS preferably handles various protocols,
such as HTTP, HTTPS, and the like. The SUT response stream validation can be of varying
degrees, such as limited, identical output and/or equivalent output. Thus, for example, the
simulator may provide substantially limited validation that suffices for testing new versions for
crashes and long-term memory leaks. The simulator may test for "identical" output, wherein the 
output of the system under test is checked for byte-for-byte equality with the production system. 
The simulator may also check for "equivalent" output, wherein the output of the SUT and the 
production system are checked for logical equivalence (isomorphism). This type of validation 
typically involves use of specific application-level logic (e.g., checking dates in HTTP headers to 
determine if two different versions of an object being returned to a requesting client are valid,
comparing the output of a persistent multi-GET connection versus several simple GET requests,
etc.). The particular equivalence checking logic will depend on the functionalities being 
implemented, of course. As noted above, the scale of the system under test may be a single 
server (or given processes or programs running thereon), a full region of servers, multiple 
regions, and the like, and the testing environment may be used with live system load or with 
recorded client traces. 
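
The three degrees of validation described above may be sketched as follows. This is an illustrative sketch only: the enumeration values, the responses_match routine and the same_first_line predicate are hypothetical names, and the choice of "same first line" as an equivalence test is merely one assumed example of application-level logic.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical names for the three validation degrees. */
enum validation_level {
    VALIDATE_LIMITED,      /* no output check; crashes and leaks only */
    VALIDATE_IDENTICAL,    /* byte-for-byte equality */
    VALIDATE_EQUIVALENT    /* application-level logical equivalence */
};

/* Application-supplied equivalence predicate; returns nonzero on match. */
typedef int (*equiv_fn)(const char *prod, size_t prod_len,
                        const char *sut, size_t sut_len);

/* Returns 1 if the SUT output passes validation at the given level. */
static int responses_match(enum validation_level level,
                           const char *prod, size_t prod_len,
                           const char *sut, size_t sut_len,
                           equiv_fn equivalent)
{
    switch (level) {
    case VALIDATE_LIMITED:
        return 1;
    case VALIDATE_IDENTICAL:
        return prod_len == sut_len && memcmp(prod, sut, prod_len) == 0;
    case VALIDATE_EQUIVALENT:
        return equivalent != NULL &&
               equivalent(prod, prod_len, sut, sut_len) != 0;
    }
    return 0;
}

/*
 * Example predicate (an assumption for illustration): two responses are
 * treated as equivalent when their first lines, up to and including the
 * first '\n', are the same. Buffers without a newline match only when
 * they are identical.
 */
static int same_first_line(const char *prod, size_t prod_len,
                           const char *sut, size_t sut_len)
{
    size_t n = 0;

    while (n < prod_len && n < sut_len && prod[n] == sut[n]) {
        if (prod[n] == '\n')
            return 1;
        n++;
    }
    return n == prod_len && n == sut_len;
}
```

Passing the predicate as a function pointer keeps the generic comparison routine independent of the functionality being tested, consistent with the statement that the equivalence checking logic depends on the functionalities being implemented.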

Figure 5 illustrates one possible implementation of the External World Simulator. The 
EWS 500 comprises a set of software modules: a collector 502, a state machine 504, a logger 
506, an emitter 508, and a comparator 510. Preferably, the modules communicate via frame 
queues and operate in both time-slice and threaded modes of operations. The collector 502 is 
responsible for acquiring packets from the network, preferably using a sniffing library routine, 
and it also receives responses from the invisible ghosts (because it is the entry point for the 
EWS). In particular, and although not meant to be limiting, preferably the collector 502 takes 
advantage of the port-monitoring feature of existing ethernet switches in the CDN region. The 
port-monitoring feature, used for management and monitoring, allows all traffic going through 
the switch to be seen on the configured port. The collector 502 pulls traffic from the switch 
port-monitor (using the sniffing library), performs filtering for interesting packets (e.g., HTTP 
traffic on the production ghost server), and then feeds those packets into the state machine 504 
and the logger 506. The state machine 504 is the core logic of the EWS. It decides what packets 
should be sent and when. The state machine opens and closes connections between the 
participating entities, namely, the client, the production ghost server, and the invisible ghost 
server, as will be described in more detail below. The state machine also absorbs invisible ghost 
server responses to ensure that the SUT never interacts with the production servers. In particular, 
these response packets follow the path through the collector (the input to the EWS), and the state
machine recognizes them as client-bound traffic and absorbs them. 
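
The filtering and absorption behavior described above may be sketched as a single classification routine. This is a simplified illustration and not the collector or state machine themselves: the flow_key and ews_config structures, the frame_action values, and the use of one production address and one invisible ghost address are all assumptions made for the example.

```c
#include <stdint.h>

/* Hypothetical actions the EWS might take for a captured frame. */
enum frame_action {
    FRAME_IGNORE,      /* not interesting traffic */
    FRAME_REPLICATE,   /* client request: duplicate onto the SUT */
    FRAME_COMPARE,     /* production reply: hand to the comparator */
    FRAME_ABSORB       /* invisible ghost reply: never forward to a client */
};

/* Minimal flow description extracted from the IP and TCP headers. */
struct flow_key {
    uint32_t src_ip;
    uint32_t dst_ip;
    uint16_t src_port;
    uint16_t dst_port;
    uint8_t  ip_proto;     /* 6 == TCP */
};

/* Assumed configuration: one production server and one invisible ghost. */
struct ews_config {
    uint32_t production_ip;
    uint32_t invisible_ip;
    uint16_t service_port; /* e.g., 80 for HTTP */
};

static enum frame_action classify_frame(const struct ews_config *cfg,
                                        const struct flow_key *k)
{
    if (k->ip_proto != 6)
        return FRAME_IGNORE;

    /* Anything the invisible ghost sends toward a client is absorbed. */
    if (k->src_ip == cfg->invisible_ip && k->src_port == cfg->service_port)
        return FRAME_ABSORB;

    /* Client traffic bound for the production server is replicated. */
    if (k->dst_ip == cfg->production_ip && k->dst_port == cfg->service_port)
        return FRAME_REPLICATE;

    /* Production replies are kept for comparison with the SUT output. */
    if (k->src_ip == cfg->production_ip && k->src_port == cfg->service_port)
        return FRAME_COMPARE;

    return FRAME_IGNORE;
}
```

In this sketch an absorbed frame may still be passed to the comparator before being discarded; the only property the classification is meant to show is that such a frame is never emitted toward a real client.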

As illustrated, the state machine 504 feeds packets into the emitter 508 and the 
comparator 510. The emitter 508 sends packets onto the network if needed, and isolates the state
machine from the other functions. The comparator 510 assembles HTTP requests/responses
from the TCP packets. It performs equivalence checking (depending on the application logic 
included) between the production ghost response and that of the invisible ghost. In one example, 
the checking verifies that HTTP response codes match. There may be some cases when the 
codes match but the content handed back (from the respective production ghost and the invisible
ghost) differs, or the response code may not match when the content handed back is the same,
and so on. The comparator may filter the data based on given criteria. Typically, the comparator 
writes given data to a log for later analysis. The comparator typically is HTTP-specific, and the 
other modules need not have any knowledge of what protocol is being used. 
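
One possible form of the response-code check is sketched below. It is an assumption-laden illustration rather than the comparator itself: the routine names and the cmp_result values are hypothetical, each response is assumed to be fully reassembled in memory, and headers other than the status line are deliberately ignored.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

enum cmp_result {
    CMP_SAME,           /* code and body both match */
    CMP_CODE_DIFFERS,   /* body matches, response code does not */
    CMP_BODY_DIFFERS,   /* response code matches, body does not */
    CMP_BOTH_DIFFER,
    CMP_MALFORMED       /* either response could not be parsed */
};

/* Parse the 3-digit code from "HTTP/x.y NNN ..."; -1 if malformed. */
static int http_status_code(const char *resp, size_t len)
{
    const char *sp;

    if (len < 12 || memcmp(resp, "HTTP/", 5) != 0)
        return -1;
    sp = memchr(resp, ' ', len);
    if (sp == NULL || (size_t)(resp + len - sp) < 4)
        return -1;
    if (!isdigit((unsigned char)sp[1]) || !isdigit((unsigned char)sp[2]) ||
        !isdigit((unsigned char)sp[3]))
        return -1;
    return (sp[1] - '0') * 100 + (sp[2] - '0') * 10 + (sp[3] - '0');
}

/* Locate the body after the blank line that ends the headers. */
static const char *find_body(const char *resp, size_t len, size_t *body_len)
{
    size_t i;

    for (i = 0; i + 4 <= len; i++) {
        if (memcmp(resp + i, "\r\n\r\n", 4) == 0) {
            *body_len = len - (i + 4);
            return resp + i + 4;
        }
    }
    *body_len = 0;
    return NULL;
}

static enum cmp_result compare_replies(const char *prod, size_t plen,
                                       const char *beta, size_t blen)
{
    size_t pbl, bbl;
    int pc = http_status_code(prod, plen);
    int bc = http_status_code(beta, blen);
    const char *pb = find_body(prod, plen, &pbl);
    const char *bb = find_body(beta, blen, &bbl);
    int code_same, body_same;

    if (pc < 0 || bc < 0 || pb == NULL || bb == NULL)
        return CMP_MALFORMED;
    code_same = (pc == bc);
    body_same = (pbl == bbl && memcmp(pb, bb, pbl) == 0);
    if (code_same && body_same)
        return CMP_SAME;
    if (body_same)
        return CMP_CODE_DIFFERS;
    return code_same ? CMP_BODY_DIFFERS : CMP_BOTH_DIFFER;
}
```

Reporting the code and body outcomes separately lets the log distinguish the two mismatch cases noted above, namely matching codes with differing content and differing codes with the same content.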

As noted above, the various modules that comprise the EWS enable the EWS to
masquerade (to the SUT) as clients. As connections are opened and closed, the EWS duplicates
the TCP traffic flowing through the production system. It parses the ghost TCP streams into 
HTTP responses, checks for equivalence (or other application-level logic validation), records 
mismatches for human or automated analysis, and facilitates performance analysis of the SUT or
the components thereof. As noted above, the EWS (specifically, the state machine) absorbs or
"black-holes" the SUT responses passed from the invisible ghosts through the collector to isolate
the SUT from the real-world. 

Figures 6-7 illustrate state changes of the state machine in response to receiving packets 
from the various endpoints of the connections. Normal TCP connections only have two (2) 
endpoints, namely, the client and the production server. In the testing infrastructure, on the
contrary, three (3) endpoints exist, namely, the client, the production system server and the
invisible ghost server. Figure 6 is the opening state diagram, and Figure 7 is the closing state 
diagram. This separation is for clarity and omits some possible states. For instance, the 
production system may start closing the connection before the invisible system has finished 
establishing it. In addition, the effect of reset packets is ignored for convenience as those packets
are not considered part of a normal traffic flow. Familiarity with basic TCP operation is
presumed. In the opening diagram (Figure 6), the states are denoted by three (3) binary digits; a
"1" in a given position indicates that a particular packet has been received, and a "0" represents that it
has not been received. For the opening states, the leftmost bit represents the client's first ACK,
the middle bit the production server SYNACK, and the rightmost bit the invisible server
SYNACK. It is assumed that the client SYN has already been received or the state machine
would not be entered. There are more control packets sent as part of connection tear-down, as
illustrated in the closing diagram (Figure 7). The relevant packets examined are the invisible
ghost fin (I_FIN), production server fin (P_FIN), client fin (C_FIN), and client finack of the
client fin (I_ACK(C_F)). Some packets that are part of the tear-down process for normal TCP
connections are not relevant to the state machine. Different line types denote which packet was
received that triggered the state change, and optionally what packet was sent as a result (indicated
by an S(), S(A) being an ACK, and S(F) being a FIN). Dashed lines are used for those state
changes that include sending out a packet.
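
The three-bit encoding of the opening states may be sketched as follows. Only the bit assignment is taken from the description above; the identifier names are hypothetical, and the sketch does not attempt to reproduce which packets are sent on each transition of Figure 6.

```c
#include <stdint.h>

/* Bit positions as described: leftmost = client's first ACK,
 * middle = production server SYNACK, rightmost = invisible server SYNACK. */
enum open_bit {
    OPEN_CLIENT_ACK  = 0x4,
    OPEN_PROD_SYNACK = 0x2,
    OPEN_INV_SYNACK  = 0x1
};

#define OPEN_ALL_SEEN 0x7

/* Record receipt of one of the three packets; repeats are harmless. */
static uint8_t open_state_update(uint8_t state, enum open_bit seen)
{
    return (uint8_t)((state | (uint8_t)seen) & OPEN_ALL_SEEN);
}

/* The opening handshake is complete once all three packets were seen. */
static int open_state_complete(uint8_t state)
{
    return state == OPEN_ALL_SEEN;
}

/* Render the state as the three binary digits used in the diagram. */
static void open_state_label(uint8_t state, char out[4])
{
    out[0] = (state & OPEN_CLIENT_ACK)  ? '1' : '0';
    out[1] = (state & OPEN_PROD_SYNACK) ? '1' : '0';
    out[2] = (state & OPEN_INV_SYNACK)  ? '1' : '0';
    out[3] = '\0';
}
```

For example, starting from state "000", receipt of the production server SYNACK yields "010", and the connection is treated as fully open only at "111".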

Figures 8-14 illustrate representative data generated by the testing infrastructure for a
given connection. Figure 8 illustrates the client-production server conversation for the
connection. Figure 9 illustrates how the EWS duplicates the connection open and how the
invisible ghost under test responds. Figure 10 illustrates how the EWS duplicates the client's
first ack packet and the client request. Figure 11 illustrates the production and invisible ghost
responses. Figure 12 illustrates the client acknowledgement, the EWS acknowledgement and
FIN. Figure 13 illustrates the connection close, and Figure 14 illustrates a representative
comparator report.

The present invention provides a number of new features and advantages. First, EWS 
enables monitoring of a production system to generate network-packet level accurate traffic that 
is then duplicated onto a SUT. This provides an extremely high-fidelity workload for the test 
system. Second, the output of the system is compared against the results of a running production 
system, which provides a very detailed check of whether the new system is producing the desired 
results without requiring the construction of a large number of test cases. Finally, the system under 
test is subjected to real world workload, but the system has no interactions with the outside. 

The following illustrates various routines and data structures that may be used to 
implement the EWS modules described above: 

Collector: 



Structure Detail 



frame_header_t - frame_header_t is the header structure that occurs inside all frames, a basic unit 
of memory management. A frame is the basic unit of allocation in IG. time_stamp is set by the 
collector (or replay logic) when a packet is generated. refcount is a bitmap, indicating which 
subsystems have interest in this frame. When it becomes zero, the frame should be freed. 
typedef struct _frame_header_t { 
    u_int32 frameno; 
    struct timeval time_stamp; 
    u_int16 refcount; 
    u_int16 from_our_hw_addr:1; 
    u_int16 to_our_hw_addr:1; 
    u_int16 pad:13; 
    u_int16 packet_size; 
    u_int16 frame_size; 
    u_int16 ip_start; 
    u_int16 tcp_start; 
    u_int16 ip_tot_len; 
    u_int32 ip_csum; 
    u_int32 tcp_csum; 
    u_int32 * page_cnt; 
    struct _frame_t * sm_qnext; 
} frame_header_t; 



Fields: 
    frameno              unique id, used for debugging 
    time_stamp           time of receipt of packets 
    refcount             reference count, used to determine frame liveness 
    from_our_hw_addr:1   indicates whether frame originated locally 
    to_our_hw_addr:1     indicates whether frame originated elsewhere 
    pad:13               bits reserved for future use 
    packet_size          size of payload in this frame 
    frame_size           size of this frame, not including the header 
    ip_start             byte offset of ip data in the data area 
    tcp_start            byte offset of tcp header in the data area 
    ip_tot_len           length of ip payload, in bytes 
    ip_csum              calc'ed during copy from collector 
    tcp_csum             calc'ed during copy from collector 
    page_cnt             pointer to counter used for batch frame allocation 
    sm_qnext             linking pointer used by state machine. 



frame_t - frames hold packet data inside IG. The actual length of the data array is stored in 
hdr.frame_size; data contains the IP packet/fragment. 
typedef struct _frame_t { 
    frame_header_t hdr; 
    byte data[4]; 
} frame_t; 

Fields: 
    hdr       frame header 
    data[4]   byte array holding packet data. 



frame_ptr_array_t - the frame_ptr_array_t is a structure holding a fixed number of pointers to 
frames. It is used to pass frames from the collector to the state machine and logger. 
typedef struct _frame_ptr_array_t { 
    struct _frame_ptr_array_t * next; 
    u_int32 n_ptrs; 
    frame_t * frm_ptrs[PTRS_PER_FPA]; 
} frame_ptr_array_t; 
Fields: 
    next                     used for linked listing 
    n_ptrs                   number of live pointers in the array 
    frm_ptrs[PTRS_PER_FPA]   array of frame pointers 

frm_collector_frame_alloc 

Allocates a frame for use by the collector. The frame_size argument specifies the data payload 
size of the frame. The frame header is initialized by this routine, but the frame data is not zero 
filled. 

frame_t * frm_collector_frame_alloc( u_int16 frame_size, 
                                     frm_blk_t ** fb) 

Parameters: 
    frame_size   size of frame to be allocated 
Returns: 
    frame_t *    allocated frame, not necessarily zero-filled; NULL if unable to allocate 
                 frame. 



frm_fpa_alloc 

Allocates frame pointer arrays. 

frame_ptr_array_t * frm_fpa_alloc() 
Returns: 
    frame_ptr_array_t * if successful, NULL if unable to allocate 
Notes: 
    Uses an internal memory pool, using the first word of each element as a chaining pointer. 
    Allocates in groups of fpa_increment. 
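
A pool of this kind, in which the first word of each element doubles as the free-list link and storage is obtained in batches, might look like the following minimal sketch. The values of PTRS_PER_FPA and FPA_INCREMENT, the use of malloc, and the *_sketch function names are assumptions for illustration; the specification does not give them.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define PTRS_PER_FPA  16   /* assumed value; not given in the text */
#define FPA_INCREMENT 8    /* assumed batch size ("fpa_increment") */

typedef struct _frame_t frame_t;   /* opaque for this sketch */

typedef struct _frame_ptr_array_t {
    struct _frame_ptr_array_t *next;   /* first word: also the pool chain link */
    unsigned int n_ptrs;
    frame_t *frm_ptrs[PTRS_PER_FPA];
} frame_ptr_array_t;

static frame_ptr_array_t *fpa_free_list = NULL;

frame_ptr_array_t *fpa_alloc_sketch(void)
{
    frame_ptr_array_t *fpa;

    if (fpa_free_list == NULL) {
        /* Pool is empty: obtain a batch and chain its elements together. */
        frame_ptr_array_t *batch = malloc(FPA_INCREMENT * sizeof *batch);
        int i;
        if (batch == NULL)
            return NULL;               /* unable to allocate */
        for (i = 0; i < FPA_INCREMENT; i++) {
            batch[i].next = fpa_free_list;
            fpa_free_list = &batch[i];
        }
    }
    fpa = fpa_free_list;               /* pop the head of the free list */
    fpa_free_list = fpa->next;
    fpa->next = NULL;
    fpa->n_ptrs = 0;
    return fpa;
}

void fpa_free_sketch(frame_ptr_array_t *fpa)
{
    assert(fpa != NULL);
    fpa->next = fpa_free_list;         /* push back onto the free list */
    fpa_free_list = fpa;
}
```

Batches are never returned to the system in this sketch; freed elements are simply reused by later allocations.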



frm_fpa_free 

Frees a frame pointer array. 

void frm_fpa_free( frame_ptr_array_t * fpa) 
Parameters: 
    fpa   frame pointer array to free 



frm_frame_set_interest 

Sets reference count bits indicating specified sources using the frame. Any module is allowed to 
set bits indicating that another module will be processing this frame. It is not an error to set an 
interest bit that is already set. 

void frm_frame_set_interest( frame_t * frm, 
                             u_int8 interest_bits) 

Parameters: 
    frm             frame of interest 
    interest_bits   bit mask of sources of interest. 

Example: 
    // logger is in replay mode, wants to make frames 
    // of interest to state machine before handing off. 
    frame_t * frm; 

    // ... read frame from disk 
    frm_frame_set_interest(frm, FRM_BIT_SM); 

    // queue frame to state machine 



frm_frame_clear_interest 

Clears the interest bit indicated by the mask. A module should only clear its own interest bit. If 
the mask drops to zero, the frame will be freed as a side effect of this routine. Clearing an 
already clear bit is an error. 

void frm_frame_clear_interest( frame_t * frm, 
                               u_int8 interest_bit) 

Parameters: 
    frm            frame of interest 
    interest_bit   bit to clear 

frm_blk_frame_clear_interest 

Clears the interest bit in all the frames in the frame block. 

extern void frm_blk_frame_clear_interest( frm_blk_t * blk, 
                                          u_int8 interest_bit) 

Parameters: 
    blk            block of interest 
    interest_bit   interest bit to be cleared 
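
The interest-bit discipline described above (setting is idempotent, clearing an already clear bit is an error, and the frame is released when the mask reaches zero) can be sketched as follows. The sketch_* names, the FRM_BIT_LGG bit, the bit values, and the `freed` flag standing in for real deallocation are all hypothetical; only FRM_BIT_SM is named in the text.

```c
#include <assert.h>

#define FRM_BIT_SM  0x01u   /* state machine interest (value assumed) */
#define FRM_BIT_LGG 0x02u   /* assumed name and value for the logger's bit */

typedef struct {
    unsigned short refcount;   /* bitmap of interested subsystems */
    int freed;                 /* stand-in for actually releasing the frame */
} sketch_frame_t;

void sketch_set_interest(sketch_frame_t *frm, unsigned char bits)
{
    frm->refcount |= bits;     /* setting an already-set bit is not an error */
}

void sketch_clear_interest(sketch_frame_t *frm, unsigned char bit)
{
    assert(frm->refcount & bit);            /* clearing a clear bit is an error */
    frm->refcount &= (unsigned short)~bit;
    if (frm->refcount == 0)
        frm->freed = 1;                     /* real routine would free here */
}
```

Using a bitmap rather than a plain counter lets each subsystem release only its own claim, which is what makes the "already clear" check possible.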



State Machine: 







sm_init 

configuration entry point of state machine 

extern void sm_init(config_info_t * ci) 
Parameters: 
    ci   configuration information 

sm_dowork 

event entry point of state machine system. Will yield after processing yield_frames (or slightly 
more) or when it runs out of work. 

extern void sm_dowork( u_int32 yield_frames) 
Parameters: 
    yield_frames   after how many frames to yield. 

sm_shutdown 

Called on shutdown. Use to dump summary stats, etc. 

void sm_shutdown(void) 



sti_update_cur_time (internal) 
update notion of current time. 

void sti_update_cur_time() 

sti_delayed_ack_timeout (internal) 
update/start delayed partial ack timer 

void sti_delayed_ack_timeout( sm_state_ply_t * ply, 
                              u_int32 ackval) 
Parameters: 
    ply      connection to update 
    ackval   value to ack 

sti_drain_ack_timeout (internal) 
update/start drain ack timer 

void sti_drain_ack_timeout( sm_state_ply_t * ply) 
Parameters: 
    ply   connection to update 

sti_set_zero_timeout (internal) 
set zero timer, basically means there is more ci data to send. 

void sti_set_zero_timeout( sm_state_ply_t * ply) 
Parameters: 
    ply   connection to update 

sti_set_cfn_delay_timeout (internal) 
don't delay sending the CFN too long. 

void sti_set_cfn_delay_timeout( sm_state_ply_t * ply) 
Parameters: 
    ply   connection to update 

sti_delayed_ack_timeout_cancel (internal) 
Cancel delayed ack timer 

void sti_delayed_ack_timeout_cancel( sm_state_ply_t * ply) 
Parameters: 
    ply   connection to update 



sti_update_idle_timeout (internal) 
update idle timer for the connection 

void sti_update_idle_timeout( sm_state_t * state) 
Parameters: 
    state   connection to update 

sti_restart_idle_timeout (internal) 

Restart idle timeout, or indicate connection death 

u_int32 sti_restart_idle_timeout( sm_state_t * state) 
Parameters: 
    state   connection to update 

Returns: 
    0   connection should be terminated 
    1   connection is ok, idle time reset. 

Notes: 
(internal) 
    An idle timeout has expired. Check state->last_packet_time to determine whether this 
    connection has really been idle long enough to be terminated. If the connection should be 
    kept alive, the idle timer is reset. 
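
The lazy idle check described in these notes can be sketched as below. The function name, the IDLE_LIMIT_SEC value, and passing the times as explicit arguments (rather than reading them from the state structure) are assumptions for illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/time.h>

#define IDLE_LIMIT_SEC 60   /* assumed idle limit; not given in the text */

/* Returns 0 if the connection should be terminated, 1 if it is still alive.
 * When alive, *next_expiry is set to last_packet_time + IDLE_LIMIT_SEC so
 * the caller can re-arm the idle timer. */
int restart_idle_sketch(const struct timeval *now,
                        const struct timeval *last_packet_time,
                        struct timeval *next_expiry)
{
    struct timeval deadline;

    assert(now != NULL && last_packet_time != NULL && next_expiry != NULL);
    deadline = *last_packet_time;
    deadline.tv_sec += IDLE_LIMIT_SEC;

    if (now->tv_sec > deadline.tv_sec ||
        (now->tv_sec == deadline.tv_sec && now->tv_usec >= deadline.tv_usec))
        return 0;               /* really idle long enough: terminate */

    *next_expiry = deadline;    /* traffic arrived after the timer was armed */
    return 1;
}
```

The benefit of this design is that packet arrival only has to record a timestamp; the timer itself is touched only when it fires.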



sti_restart_cfn_timeout (internal) 

Restart client fin delay timeout, or indicate fin should be sent. 

u_int32 sti_restart_cfn_timeout( sm_state_t * state) 
Parameters: 
    state   connection to update 
Returns: 
    0   connection should be terminated 
    1   connection is ok, idle time reset. 

sti_timer_syn_rexmit_start (internal) 
Start CSN retransmit timer 

void sti_timer_syn_rexmit_start( sm_state_ply_t * ply) 
Parameters: 
    ply   state block 

sti_timer_syn_rexmit_cancel (internal) 
Cancel CSN retransmit timer 

void sti_timer_syn_rexmit_cancel( sm_state_ply_t * ply) 
Parameters: 
    ply   state block 



sti_set_timeout (internal) 
Set or update absolute timer. 

void sti_set_timeout( void ** timer, 
                      void * data, 
                      u_int32 index, 
                      u_int32 datum, 
                      struct timeval * time) 

Parameters: 
    timer   pointer to timer to be set or reset 
    data    opaque ptr index 
    index   integer index 
    datum   integer item stored 
    time    time to expire 

Notes: 
(internal) 
    Upon return *timer will point to the timer. If *timer is non-NULL upon the call, it is a 
    presumptive old timer with the same (data,index) and will be freed. 



sti_set_rel_timeout (internal) 
Update or set relative timer 

void sti_set_rel_timeout( void ** timer, 
                          void * data, 
                          u_int32 index, 
                          u_int32 datum, 
                          struct timeval * rel_time) 

Parameters: 
    timer      pointer to timer to be set or reset 
    data       opaque ptr index 
    index      integer index 
    datum      integer item stored 
    rel_time   time to expire (from now). 

Notes: 
(internal) 
    Same as sti_set_timeout, except computes the absolute time of the timeout based on the 
    current time and rel_time. 
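
Converting a relative timeout into an absolute expiry amounts to adding two struct timeval values with a microsecond carry. The helper below is a minimal sketch of that step; its name is hypothetical, and it assumes both inputs are already normalized (tv_usec below 1,000,000).

```c
#include <assert.h>
#include <sys/time.h>

/* Returns now + rel, normalizing the microsecond field. */
struct timeval abs_from_rel_sketch(struct timeval now, struct timeval rel)
{
    struct timeval out;

    assert(now.tv_usec >= 0 && now.tv_usec < 1000000);
    assert(rel.tv_usec >= 0 && rel.tv_usec < 1000000);

    out.tv_sec  = now.tv_sec + rel.tv_sec;
    out.tv_usec = now.tv_usec + rel.tv_usec;
    if (out.tv_usec >= 1000000) {      /* carry into the seconds field */
        out.tv_usec -= 1000000;
        out.tv_sec  += 1;
    }
    return out;
}
```

The result could then be handed to an absolute-time routine such as sti_set_timeout.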



sti_remove_timeout (internal) 

Removes the timeout specified by *timer. It is an error to remove a non-present timeout. Will set 
*timer to NULL on return. 

void sti_remove_timeout( void ** timer) 
Parameters: 
    timer   pointer to timer 

sti_remove_all_timeouts (internal) 
Clean up timeouts 

void sti_remove_all_timeouts( sm_state_t * state) 
Parameters: 
    state   structure to clear 

Notes: 
(internal) 
    Removes all timeouts with this state structure, including subtypes. 

sti_min_timeout (internal) 
determine waiting time 

void sti_min_timeout( struct timeval * wait_time) 
Parameters: 
    wait_time   see below 

Notes: 
(internal) 
    wait_time, on input, should be set to the maximum time (relative to the last call to 
    sti_do_expired) that we should block. 
    On return, wait_time is the time to block for. It will be no more than the input value, and 
    possibly shorter. 

sti_do_expired (internal) 
invoke and remove expired timers 

void sti_do_expired() 
Notes: 
(internal) 
    Finds set of expired timers, copies them and calls back smt_expired_timer. It is safe from 
    the callback to manipulate the current timer. 

epkt_t - struct containing the head of a TCP packet. Used to build packets from scratch or 
rewrite existing packets. 
typedef struct _epkt_t { 
    struct iphdr ip; 
    struct tcphdr tcp; 
} epkt_t; 
Fields: 
    ip    ip header 
    tcp   tcp header 



smu_state_hash (internal) 

looks up entry in tcp hash table. Tries both the src and dst pairs as possible keys. src_is_client is 
set to TRUE if the src_ip address corresponds to the client, and FALSE otherwise. 

sm_state_t * smu_state_hash( u_int32 src_ip, 
                             u_int16 src_port, 
                             u_int32 dst_ip, 
                             u_int32 dst_port, 
                             u_int32 src_class, 
                             u_int32 dst_class, 
                             u_int16 seq_id) 

Parameters: 
    src_ip      ip address of source 
    src_port    tcp port of source 
    dst_ip      ip address of destination 
    dst_port    tcp port of destination 
    src_class   SMT_IP_PG, SMT_IP_IG or SMT_IP_CI 
    dst_class   SMT_IP_PG, SMT_IP_IG or SMT_IP_CI 
    seq_id      DNS sequence id, or 0 for any other protocol. 

Returns: 
    hash entry, if it exists 
    NULL   entry not in table 



smu_state_hash_alloc (internal) 

Creates a hash entry for the specified datum. 

sm_state_t * smu_state_hash_alloc( u_int32 src_ip, 
                                   u_int16 src_port, 
                                   u_int32 dst_ip, 
                                   u_int32 dst_port, 
                                   u_int32 src_class, 
                                   u_int32 dst_class, 
                                   u_int32 conntype, 
                                   u_int16 seq_id) 

Parameters: 
    src_ip      ip address of source 
    src_port    tcp port of source 
    dst_ip      ip address of destination 
    dst_port    tcp port of destination 
    src_class   SMT_IP_PG, SMT_IP_IG or SMT_IP_CI 
    dst_class   SMT_IP_PG, SMT_IP_IG or SMT_IP_CI 
    conntype    one of SM_C_* 
    seq_id      DNS sequence id, or 0 for any other protocol. 

Returns: 
    hash entry, after creating it. 



smu_state_hash_free (internal) 

Releases memory and pointers to the named hash entry. Also removes any timers associated with 
the state or the type specific state structures. 

void smu_state_hash_free( sm_state_t * lamb) 
Parameters: 
    lamb   hash entry to be freed. 

smu_classify_ip (internal) 

Checks an IP address against known tables of invisible and production ghosts, and returns a 
classification. 

u_int32 smu_classify_ip( u_int32 ip) 
Parameters: 
    ip   ip address 

Returns: 
    SMT_IP_IG   if address of an invisible ghost 
    SMT_IP_PG   if address of a production ghost 
    SMT_IP_UN   otherwise. 



smu_valid_tcp_packet (internal) 

Validates that the packet contains a properly checksummed IP header and TCP header and data. 
As a side effect, fills in many of the fields. 

int smu_valid_tcp_packet( frame_t * frm, 
                          u_int32 ip_start) 

Parameters: 
    frm        frame to verify 
    ip_start   start of ip data in frame 

Returns: 
    0   if not a valid TCP or IP packet 
    1   if valid IP packet 
    2   if valid TCP packet 

Notes: 
    Assumes packet header and payload are aligned on word boundaries. 
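
The IP-header half of such a validation can be illustrated with the standard Internet (one's-complement) checksum: a header whose checksum field is correct sums to zero. The sketch below is not the routine from the specification; the function names are hypothetical, and it reads the header byte by byte, so unlike the routine described above it does not depend on word alignment.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One's-complement checksum over big-endian 16-bit words. */
uint16_t inet_checksum_sketch(const uint8_t *buf, size_t len)
{
    uint32_t sum = 0;
    size_t i;

    assert(buf != NULL || len == 0);
    for (i = 0; i + 1 < len; i += 2)
        sum += (uint32_t)((buf[i] << 8) | buf[i + 1]);
    if (len & 1)                        /* odd trailing byte is the high byte */
        sum += (uint32_t)(buf[len - 1] << 8);
    while (sum >> 16)                   /* fold carries back into 16 bits */
        sum = (sum & 0xFFFFu) + (sum >> 16);
    return (uint16_t)~sum;
}

/* A header carrying a correct checksum field checksums to zero. */
int ip_header_valid_sketch(const uint8_t *hdr, size_t hdr_len)
{
    return hdr_len >= 20 && inet_checksum_sketch(hdr, hdr_len) == 0;
}
```

The same function computes the value to store: run it over the header with the checksum field zeroed and write the result into that field.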



smu_flush (internal) 

Flush any remaining work items before blocking. 

void smu_flush(void) 

smu_forward_frame (internal) 

queue frame for emission by the emitter. The sm module is done with the frame. Before the sm 
blocks, it should call sm_flush. This frame is known to be a valid IP frame. 

void smu_forward_frame( frame_t * frm) 
Parameters: 
    frm   frame to be forwarded 

smu_send_packet (internal) 
queue frame for emission by the emitter. This is the fully generic version of the function which 
takes all params. 

void smu_send_packet( emt_work_t * pinfo, 
                      int opcode) 

Parameters: 
    pinfo    all of the information about the packet-to-be 
    opcode   EMT_PK_REWRITE_DATA or EMT_PK_SEND 

Notes: 
    send_fin is only examined for opcode type EMT_PK_REWRITE_DATA. Note send_fin 
    = 0 means a FIN should be suppressed in the header if it was already there. 



smu_cmp_frame (internal) 

SM is done with this frame; hand it off to the comparator. Whom is one of 
SMT_IP_{IG,PG,UN}. Before the sm blocks, it should call sm_flush. 

void smu_cmp_frame( frame_t * frm, 
                    sm_state_t * state, 
                    u_int32 whom) 

Parameters: 
    frm    frame to comparator 
    whom   flag indicating who sent this packet 

smu_cmp_done (internal) 

Queue end of stream comparison indicator to comp 

void smu_cmp_done( sm_state_ply_t * ply) 
Parameters: 
    ply   state structure 

smu_unknown_connection_frame (internal) 
received a frame for which we can't find a connection 

void smu_unknown_connection_frame( sm_state_t * state, 
                                   frame_t * frm, 
                                   u_int32 whom) 

Parameters: 
    state   connection 
    frm     frame 
    whom    what to do with frame 



smu_q_drop_all (internal) 

Walk a linked list (linked by sm_qnext), freeing (smu_drop_frame'ing) all the frames. 

int smu_q_drop_all( sm_f_t * l) 
Parameters: 
    l   sm_f_t list to free 

Returns: 
    number of packets freed 

smu_q_frm (internal) 
Insert frame at tail of fifo 

void smu_q_frm( frame_t * frm, 
                sm_f_t * l) 
Parameters: 
    frm   frame to insert 
    l     Fifo 
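
The two queue routines above operate on a singly linked FIFO threaded through each frame's sm_qnext pointer. A minimal sketch of that structure follows; the sk_* type and function names are hypothetical, and the `dropped` flag merely stands in for releasing the frame.

```c
#include <assert.h>
#include <stddef.h>

typedef struct sk_frame {
    struct sk_frame *sm_qnext;   /* linking pointer, as in frame_header_t */
    int dropped;                 /* stand-in for freeing the frame */
} sk_frame_t;

typedef struct {
    sk_frame_t *head;
    sk_frame_t *tail;
} sk_fifo_t;

/* Insert frame at the tail of the FIFO. */
void sk_q_frm(sk_frame_t *frm, sk_fifo_t *l)
{
    assert(frm != NULL && l != NULL);
    frm->sm_qnext = NULL;
    if (l->tail != NULL)
        l->tail->sm_qnext = frm;
    else
        l->head = frm;           /* list was empty */
    l->tail = frm;
}

/* Walk the list, dropping every frame; returns the number dropped. */
int sk_q_drop_all(sk_fifo_t *l)
{
    int n = 0;
    sk_frame_t *f;

    assert(l != NULL);
    f = l->head;
    while (f != NULL) {
        sk_frame_t *next = f->sm_qnext;   /* read the link before release */
        f->dropped = 1;
        f = next;
        n++;
    }
    l->head = l->tail = NULL;
    return n;
}
```

Keeping both head and tail pointers makes insertion at the tail a constant-time operation.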



smu_enter_timewait (internal) 

Entering timewait state; trigger comparison. 

void smu_enter_timewait( sm_state_ply_t * ply) 
Parameters: 
    ply   state block 

smu_send_reset (internal) 
generate a reset against the specified packet. 

void smu_send_reset( frame_t * frm) 
Parameters: 
    frm   packet which triggered the reset 

smu_send_ack (internal) 
generate an ack packet on the specified connection. 

void smu_send_ack( sm_state_ply_t * ply, 
                   u_int32 ack, 
                   u_int32 win) 

Parameters: 
    ply   connection state structure 
    ack   absolute ack sequence number to send 

smu_send_fin (internal) 
generate a FIN packet on the specified connection. 

void smu_send_fin( sm_state_ply_t * ply) 
Parameters: 
    ply   connection state structure 

smu_send_syn (internal) 
generate a SYN packet on the specified connection. 

void smu_send_syn( sm_state_ply_t * ply) 
Parameters: 
    ply   connection state structure 



smu_cmp_state_done (internal) 
Queue end of stream comparison indicator to comp 

void smu_cmp_state_done( sm_state_t * state) 

spc_ack (internal) 

Helper function to spc_dack that does ack transmission. 
int spc_ack( sm_state_ply_t * ply, 
             u_int32 * ack, 
             u_int32 * window) 
Parameters: 
    ply      state structure 
    ack      ack to send 
    window   window to send 

Returns: 
    0   if nothing needs to be done 
    1   if the ack/window should be sent immediately 
    2   if the ack/window sending can be delayed 

Notes: 
(internal) 
    Logic: Acks are queued in the order received, and processed in the same order. Loop over 
    the queued acks, sending all acks that are less than the last byte of data sent by the 
    invisible ghost. If an ack is found to be ahead of the data, call spc_determine_ack 
    to see if a frame boundary near the ack can be found. If a frame boundary bigger than the 
    last sent ack is found, we consider it for sending. This ack is sent if (1) the suggested ack 
    equals the recorded ack or (2) force_partial_ack is set. If an ack is fully consumed, we 
    delete it. 

    Drain states arrive when we expect no more acks from the client, but want to pull all 
    remaining data from the invisible ghost. In the drain state, we simply generate an ack 
    every time we see there is unacked IG data. 

    Once we have started sending acks because of drain, we ignore any client acks from then 
    on. 

    Force partial acks is overloaded: in drain mode, force_partial is a signal to generate an 
    ack. 



spc_data (internal) 
Helper function to spc_dack that does data transmission. 

void spc_data( sm_state_ply_t * ply, 
               u_int32 * data_seq, 
               char ** data, 
               u_int16 * data_len, 
               frame_t ** frm) 
Parameters: 
    ply        state structure 
    data_seq   sequence number of data start 
    data       pointer to data 
    data_len   length of data 
    frm        frame which data points into 

Notes: 
(internal) 
    Logic: loop over data, sending any data currently allowed by the ig transmission window. 
    If the window causes a packet to be fragmented, we fragment it and send it on. Once a 
    packet is completely sent, we move it from the outside_window list to the sent_not_acked 
    list. Both lists are maintained in increasing order. 

    One complication may arise from HTTP persistent connections. If a browser has a 
    persistent connection open to a production ghost (PG), and the PG initiates the close, one 
    will typically see the sequence: 

        pg_data ci_ack (long pause) pg_fin ci_fin_ack (long pause) 

    then when the browser tries to reuse the connection 

        c_data p_reset. 

    This is followed by the browser opening a new connection to the server to 
    fetch whatever URL-get was reset. 

    To keep the IG from processing these URLs twice, we don't send on any client 
    data received after a PFN/CFA until we see an IFN. Once the IFN is received, we push on 
    client data, which should then generate a reset. 



spc_determine_ack (internal) 
determine an ack value 

u_int32 spc_determine_ack( sm_state_ply_t * ply, 
                           u_int32 new_ack, 
                           int examine_sna) 

Parameters: 
    ply           state block 
    new_ack       base of new ack 
    examine_sna   boolean, whether to look at sent not acked. 

Returns: 
    0                if no ack to be generated 
    0-relative ack   otherwise. 

Notes: 
(internal) 
    If examine_sna == FALSE, just use ply->ci.acks + spontaneously acked. 
    If there is data in ply->ig.sent_not_acked, see if it is now covered. Lots of crafty segment 
    alignment logic. 

    Caller should sweep ig.sent_not_acked and outside_window. 



spc_release_ig_data (internal) 

spc_release_ig_data is invoked whenever the EWS sends the IG a new ack. The routine walks 
through the invisible ghost sent_not_acked list, looking for packets that have been fully acked. 

void spc_release_ig_data( sm_state_ply_t * ply, 
                          u_int32 ack) 

Parameters: 
    ply   state block 
    ack   new client ack value, 0 relative 

spc_release_cli_data (internal) 

spc_release_cli_data is invoked whenever the IG sends a greater ack value. The routine walks 
through the cli sent_not_acked list, looking for packets that have been fully acked. Uses 
ply->ig.acks as the ack value. 

void spc_release_cli_data( sm_state_ply_t * ply) 

Parameters: 
    ply   state block 
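
Both release routines walk a sent_not_acked list and release the packets that a new ack fully covers. The sketch below shows one way that walk might look for a list kept in increasing sequence order. The sk_pkt_t type, its fields, and the function name are hypothetical; 0-relative sequence numbers are assumed, so wraparound is not handled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct sk_pkt {
    struct sk_pkt *next;
    uint32_t seq;        /* 0-relative sequence number of the first byte */
    uint32_t len;        /* payload length in bytes */
    int released;        /* stand-in for dropping interest in the frame */
} sk_pkt_t;

/* Releases packets at the head of the list whose last byte is covered by
 * ack (the next byte expected). Returns the new head of the list. */
sk_pkt_t *sk_release_acked(sk_pkt_t *head, uint32_t ack)
{
    while (head != NULL && head->seq + head->len <= ack) {
        sk_pkt_t *next = head->next;
        assert(next == NULL || next->seq >= head->seq);  /* increasing order */
        head->released = 1;
        head = next;
    }
    return head;
}
```

A packet only partly covered by the ack stays on the list, which matches the "fully acked" condition in the text.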



spc_timeout_ack (internal) 

The timer associated with a delayed partial ack has gone off. If we have not advanced beyond 
that ack, force a partial ack transmission. 

void spc_timeout_ack( sm_state_ply_t * ply, 
                      u_int32 ackno) 

Parameters: 
    ply     state block 
    ackno   delayed ack number 



sm_state_t - primary state vehicle for TCP connections. The index into the hash table will be the 
(client ip, port). 

typedef struct _sm_state_t { 
    struct _sm_state_t * next; 
    struct _sm_state_t * lru_next; 
    struct _sm_state_t * lru_prev; 
    u_int32 ci_ip; 
    u_int32 pg_ip; 
    u_int32 ig_ip; 
    u_int16 ci_port; 
    u_int16 pg_port; 
    u_int16 ig_port; 
    u_int16 conntype; 
    u_int32 hash_index; 
    struct timeval last_frame_time; 
    void * timer_idle; 
    void * type_specific; 
    u_int16 ipproto; 
    u_int16 rec_pkts; 
} sm_state_t; 
Fields: 
    next              linked list construction 
    lru_next          linked list for old connection removal 
    lru_prev          linked list for old connection removal 
    ci_ip             client ip address 
    pg_ip             production ghost ip address 
    ig_ip             invisible ghost address 
    ci_port           client TCP port 
    pg_port           production ghost TCP port 
    ig_port           invisible ghost port 
    conntype          which of SM_C_* 
    hash_index        index into the hash table for sm 
    last_frame_time   timestamp at which last frame arrived 
    timer_idle        pointer to idle timer 
    type_specific     info specific to conntype 
    ipproto           packet protocol (tcp/udp/ip) 
    rec_pkts          origin of received packets on this state (ci | ig | pg) 



sm_f_t - Helper structure used to maintain FIFO connections. Uses frame->hdr.sm_qnext for 
linked lists. 

typedef struct _sm_f_t { 
    frame_t * head; 
    frame_t * tail; 
} sm_f_t; 

Fields: 
    head   head of linked list 
    tail   tail of linked list 



sm_h_t - Helper structure used inside sm_state_ply_t 
typedef struct _sm_h_t { 
    u_int32 seqbase; 
    u_int32 sent; 
    u_int32 acks; 
    u_int32 win; 
    sm_f_t sent_not_acked; 
    sm_f_t outside_window; 
    u_int32 fin_sno; 
    u_int16 ip_id; 
    u_int16 options; 
    u_int16 mss; 
    u_int16 win_scale; 
} sm_h_t; 

Fields: 
    seqbase          initial sequence number 
    sent             0-relative highest data sequence number sent 
    acks             0-relative highest ack sent 
    win              current window 
    sent_not_acked   linked list of packets sent, but not acked 
    outside_window   data packets outside the send window 
    fin_sno          fin sequence number (not zero-relative) 
    ip_id            id field of last ip packet seen; used to detect out of order packets 
    options          options sent with SYN 
    mss              advertised mss 
    win_scale        window scale in this direction (currently unused) 

Notes: 
(internal) 
    One is maintained for each of the client, production ghost and invisible ghost. 



sm_state_ply_t - state holder for play'd (split descriptions) 
typedef struct _sm_state_ply_t { 
    u_int32 m_state; 
    u_int32 syn_retries:27; 
    u_int32 cmp_done:1; 
    u_int32 reuse_protect:1; 
    u_int32 started_draining:1; 
    u_int32 timewait:1; 
    u_int32 spontaneously_acked; 
    u_int32 ci_lastack; 
    u_int32 ci_ack; 
    u_int32 ci_win; 
    void * timer_dack_xmit; 
    sm_state_t * sm_state; 
    sm_h_t ci; 
    sm_h_t pg; 
    sm_h_t ig; 
    sm_f_t ci_acks; 
} sm_state_ply_t; 
Fields: 
    m_state               internal TCP state 
    syn_retries:27        syn rexmit counter 
    cmp_done:1            flag: has smu_cmp_done been invoked? 
    reuse_protect:1       flag: SYN arrived on live connection 
    started_draining:1    flag: has a drain mode ack been sent 
    timewait:1            flag: wait a bit before removing connection 
    spontaneously_acked   number of bytes spontaneously acked 
    ci_lastack            last ack sent to ig 
    ci_ack                last ack received from ci 
    ci_win                last window received from ci 
    timer_dack_xmit       timer for DACK rexmit 
    sm_state              backpoint to parent 
    ci                    client state 
    pg                    production ghost state 
    ig                    invisible ghost state 
    ci_acks               FIFO of client acks ahead of data 

Notes: 
(internal) 
    reuse_protect is set when a connection is draining and a new syn from the same client 
    (ip/port) arrives. Reuse_protect causes all packets from the pg and client to be thrown 
    away, giving the ig a chance to finish the first connection. 

    m_state - state bits from internal open state machine or'ed with state bits from close state 
    machine << 5. 

    The index into the hash table will be the (client ip, port). 
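
The m_state note above describes packing the open-state bits in the low bits and the close-state bits shifted left by 5. A minimal sketch of that packing follows; the macro and function names are hypothetical, and only the shift of 5 comes from the text.

```c
#include <assert.h>

#define CLOSE_SHIFT 5                          /* from the note: close bits << 5 */
#define OPEN_MASK   ((1u << CLOSE_SHIFT) - 1u) /* low five bits hold open state */

/* Combine open-state bits with close-state bits into one m_state word. */
unsigned m_state_pack(unsigned open_bits, unsigned close_bits)
{
    assert(open_bits <= OPEN_MASK);
    return open_bits | (close_bits << CLOSE_SHIFT);
}

/* Extract the open-state bits. */
unsigned m_state_open(unsigned m_state)
{
    return m_state & OPEN_MASK;
}

/* Extract the close-state bits. */
unsigned m_state_close(unsigned m_state)
{
    return m_state >> CLOSE_SHIFT;
}
```

Keeping both machines in one word lets a single comparison or switch test the combined open/close state.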






smt_process_log (internal) 

Processes a packet in the tcp subsystem. If processed, the frame may have been freed. Assumes 
caller has determined that this is a valid TCP/IP frame. 

void smt_process_log( frame_t * frm, 
                      sm_state_t * state) 

Parameters: 
    frm     frame to be processed 
    state   structure associated with the connection 

Returns: 
    0   if processed frame 
    1   if did not process frame 

smt_process_fwd (internal) 

Processes a packet in the tcp subsystem. If processed, the frame may have been freed. Assumes 
caller has determined that this is a valid TCP/IP frame. 

void smt_process_fwd( frame_t * frm, 
                      sm_state_t * state) 

Parameters: 
    frm     frame to be processed 
    state   structure associated with the connection 

Returns: 
    0   if processed frame 
    1   if did not process frame 



smt_ply_h_nullx (internal) 

Processes a packet in the tcp subsystem. This is the workhorse routine for the state machine. 
Preferably, it is split up into 3 sections, depending on where the packet originated from. The 
entire state machine can be implemented in one function by noting that the states are path 
invariant, i.e., it does not matter how the state was arrived at, only that it is in a given state. 
Because of this, behavior can be determined based on specific packets by doing simple checks to 
make sure appropriate packets have been seen earlier. In addition to managing the state 
according to the state machine, all the data flow/acknowledgement logic is handled either in this 
routine or by helper functions. Often, the acks generated by the client will not line up with the 
data packets sent by the invisible ghost. 

void smt_ply_h_nullx( sm_state_t * state, 
                      frame_t * frm) 

Parameters: 
    state   structure associated with the connection 
    frm     frame to be processed 

Returns: 
    0   if processed frame 
    1   if did not process frame 



smt_process_ply (internal) 

Processes a packet in a split stream. Assumes caller has determined that this is a valid TCP/IP 
frame. 

void smt_process_ply( frame_t * frm, 
                      sm_state_t * state) 

Parameters: 
    frm     frame to be processed 
    state   structure associated with the connection 

Returns: 
    0   if processed frame 
    1   if did not process frame 

smtcp_doframe (internal) 

Determines if a packet is part of an existing connection. If not, do we need to create a connection 
for it, and if so, what type of connection? If processed, the frame may have been freed. Assumes 
caller has determined that this is a valid TCP/IP frame. 

u_int32 smtcp_doframe( frame_t * frm, 
                       u_int32 ip_start) 

Parameters: 
    frm        frame to be processed 
    ip_start   byte offset of the start of the TCP header 

Returns: 
    0   if processed frame 
    1   if did not process frame 



smt_ply_free (internal) 

Closing down a smt_ply structure. Let the comparator know it's time to compare these streams. 

static void smt_ply_free( sm_state_ply_t * ply) 
Parameters: 
    ply   state block to be freed 

smt_idle_timeout (internal) 
handle idle timer expiration 

void smt_idle_timeout( sm_state_t * state) 
Parameters: 
    state   control block of timeout 

Notes: 
(internal) 
    Idle timeout has gone off for this connection. The idle timeout of a connection is updated 
    lazily, so this does not mean the connection has necessarily been idle for this long. Call 
    sti_idle_restart to restart the timer (if not really expired) or otherwise really expire the 

Logger/Replay: 

The logger module is intended to provide the following pieces of functionality. 

First, to log all frames as they are gathered by the collector. (Optionally, one might want the 
logger to be selective - for example, to only log the client generated packets.) 

Second, to be able to play back a saved packet trace, simulating the collector. Additional desired 
functionality would be to compare the world simulator output from a saved run to a new run to 
check for regression. 

lgg_init 

configuration entry point of logging subsystem 

extern void lgg_init( config_info_t * ci, 
                      int nowrite) 

Parameters: 
    ci        configuration information 
    nowrite   if set, force the logger to not log 

lgg_shutdown 

Shut down logger/write last disk block 
void lgg_shutdown(void) 

lgg_dowork 

event entry point of logging subsystem. 
extern void lgg_dowork() 

lgg_replay 

Entry point for log replay 

int lgg_replay( int tcpdumpmode) 
Parameters: 
    tcpdumpmode   (boolean) if set, just dump instead of replaying 
Returns: 
    0   if the replay completed successfully. 
Notes: 
    The specified log file will be opened in turn and played back. Play back means to send 
    each logged packet back through the logger interface and into the state machine. 

debug_print_frame 

tcpdump style description of the packet in frm. 

void debug_print_frame( frame_t * frm, 
                        FILE * filedes) 

Parameters: 
    frm       frame to be printed. 
    filedes   stream in which to write the information. 

Notes: 
    This routine is primarily for debugging. 



Comparator : 

h& CWT_PER_CWA - number of cmp_work_t pointers in a cmp_work_array_t 
&5 #define CWTJ>ER_CWA 10 

B _ cmpjwork_t - If frame is non-NULL, then this a frame for the comparator to analyze, and type 
H indicates the source of the frame: SMTJOP„{PG,IG,UN} for invisible ghost, production ghost 
J 8 ': and unknown (presumptive client) respectively. If frame is NULL, this packet indicates a set of 
»%20 flows that ready to compare. Included is a triple of ci, production, and invisible ghost ip and 
f i ports, respectively. The ports are in host order, while the ip addresses are in network order. 
U typedef struct { 

frame t * frame; 

u_int32 conn_id; 
25 u int32 ci_ip; 

u_int32 pg_ip; 

u,_int32 ig_ip; 

u int!6 ci_port; 

u_int!6 pg__port; 
30 u_int!6 ig_port; 



} cmp__work_t; 
Fields: 

frame TCP frame 

conn_id Connection id 

ci_jp clientip (network order) 

PS-ip production ghost ip (network order) 

ig_ip invisible ghost ip (network order) 
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cLport client port (host order) 

pg_port production ghost port (host order) 

igjport invisible ghost port (host order) 



cmp_work_array_t - Batched vector of work items for the comparator to process. 
typedef struct _cmp_work_array_t { 
    struct _cmp_work_array_t * next; 
    u_int32 n_elt; 
    cmp_work_t work_elt[CWT_PER_CWA]; 
} cmp_work_array_t; 
Fields: 

next next work array in the list 

n_elt number of live work items 

work_elt[CWT_PER_CWA] array of work items 
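
One way the batched vector might be filled is sketched below: items are appended to the tail array until it holds CWT_PER_CWA live items, at which point a new array is linked through next. The helper name cwa_append, the use of calloc, and the fixed-width typedefs are assumptions for illustration; the allocation strategy actually used is not described here.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

typedef uint16_t u_int16;       /* assumed fixed-width typedefs */
typedef uint32_t u_int32;
typedef struct frame frame_t;   /* opaque here */

#define CWT_PER_CWA 10

typedef struct {
    frame_t * frame;
    u_int32 conn_id;
    u_int32 ci_ip, pg_ip, ig_ip;
    u_int16 ci_port, pg_port, ig_port;
} cmp_work_t;

typedef struct _cmp_work_array_t {
    struct _cmp_work_array_t * next;
    u_int32 n_elt;
    cmp_work_t work_elt[CWT_PER_CWA];
} cmp_work_array_t;

/* Hypothetical helper: copy one item into the tail batch, starting a new
 * batch when there is no tail yet or the tail is full. Returns the batch
 * that received the item (the new tail), or NULL if allocation failed. */
cmp_work_array_t *cwa_append(cmp_work_array_t *tail, const cmp_work_t *item)
{
    if (tail == NULL || tail->n_elt == CWT_PER_CWA) {
        cmp_work_array_t *fresh = calloc(1, sizeof *fresh);
        if (fresh == NULL)
            return NULL;
        if (tail != NULL)
            tail->next = fresh;     /* link the new batch behind the full one */
        tail = fresh;
    }
    tail->work_elt[tail->n_elt++] = *item;
    return tail;
}
```

The consumer then walks the list through next and processes the first n_elt entries of each array, so a partially filled final batch needs no special handling.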



Emitter: 

pseudo_hdr_t - The pseudo header for UDP/TCP checksumming as defined by the TCP/IP spec. 

typedef struct _pseudo_hdr_t { 
    u_int32 saddr; 
    u_int32 daddr; 
    u_int8 zero; 
    u_int8 proto; 
    u_int16 len; 
} pseudo_hdr_t; 

Fields: 

saddr source IP address. 

daddr dest IP address. 

zero pad byte. 

proto protocol number. 

len UDP/TCP packet length including header. 
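
For illustration, the following sketch shows how a pseudo header of this shape is commonly combined with a TCP or UDP segment to compute the standard Internet checksum (a one's-complement sum of 16-bit words). The function names csum_add, csum_fold and tcp_checksum and the fixed-width typedefs are assumptions, not part of the emitter interface; the sketch also assumes the structure is laid out in 12 bytes without padding and that the caller has filled its fields in network order.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint8_t  u_int8;        /* assumed fixed-width typedefs */
typedef uint16_t u_int16;
typedef uint32_t u_int32;

typedef struct _pseudo_hdr_t {
    u_int32 saddr;
    u_int32 daddr;
    u_int8  zero;
    u_int8  proto;
    u_int16 len;
} pseudo_hdr_t;

/* Add n bytes to a running sum, reading them as big-endian 16-bit words.
 * Working on bytes keeps the result independent of host byte order. */
u_int32 csum_add(u_int32 sum, const void *buf, size_t n)
{
    const u_int8 *p = buf;
    while (n > 1) {
        sum += (u_int32)((p[0] << 8) | p[1]);
        p += 2;
        n -= 2;
    }
    if (n == 1)
        sum += (u_int32)(p[0] << 8);    /* odd trailing byte, zero padded */
    return sum;
}

/* Fold the carries back into 16 bits and take the one's complement. */
u_int16 csum_fold(u_int32 sum)
{
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return (u_int16)~sum;
}

/* Checksum over pseudo header plus segment (header and payload). The
 * checksum field inside the segment must be zero while computing. The
 * result is a host-order value; store its high byte first in the packet. */
u_int16 tcp_checksum(const pseudo_hdr_t *ph, const void *segment, size_t seg_len)
{
    u_int32 sum = csum_add(0, ph, sizeof *ph);
    sum = csum_add(sum, segment, seg_len);
    return csum_fold(sum);
}
```

A receiver can verify a segment by running the same computation with the transmitted checksum left in place; a result of zero indicates that the checksum matches.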




emt_work_t - Contains a single unit of work for the emitter thread. 
typedef struct _emt_work_t { 
    char * data; 
    char options[40]; 
    frame_t * frm_ptr; 
    int opcode; 
    u_int32 saddr; 
    u_int32 daddr; 
    u_int32 seq; 
    u_int32 ack; 
    u_int32 frm_win; 
    u_int16 sport; 
    u_int16 dport; 
    u_int16 data_len; 
    u_int16 opt_len; 
    u_int8 flags; 
} emt_work_t; 
Fields: 

data TCP payload pointer. 

options[40] TCP options. 

frm_ptr Frame to build new packet off of. 

opcode Specifies some work to be done on the frame. 

saddr Source address. 

daddr Destination address. 

seq Sequence number. 

ack Ack sequence number. 

frm_win TCP window value. 

sport Source port. 

dport Destination port. 

data_len Length of data. 

opt_len Length of options. 

flags TCP flags. 

Notes: 



All values which are also contained in network packets are assumed to be in network 
order. 
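
As a hedged illustration of the network-order rule, the sketch below fills an emt_work_t describing a bare SYN segment, converting the port and sequence values with htons and htonl. The helper emt_fill_syn, the TCP_FLAG_SYN macro, and the fixed-width typedefs are assumptions for illustration. The opcode is left at zero because the opcode values are not listed in this section.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>

typedef uint8_t  u_int8;        /* assumed fixed-width typedefs */
typedef uint16_t u_int16;
typedef uint32_t u_int32;
typedef struct frame frame_t;   /* opaque here */

typedef struct _emt_work_t {
    char * data;
    char options[40];
    frame_t * frm_ptr;
    int opcode;
    u_int32 saddr, daddr;
    u_int32 seq, ack;
    u_int32 frm_win;
    u_int16 sport, dport;
    u_int16 data_len, opt_len;
    u_int8 flags;
} emt_work_t;

#define TCP_FLAG_SYN 0x02       /* SYN bit of the TCP flags byte */

/* Hypothetical helper: describe a SYN with no payload and no options.
 * Addresses are expected in network order already; ports and the initial
 * sequence number are given in host order and converted here. */
void emt_fill_syn(emt_work_t *w, u_int32 saddr_net, u_int32 daddr_net,
                  u_int16 sport_host, u_int16 dport_host, u_int32 isn_host)
{
    memset(w, 0, sizeof *w);
    w->data = NULL;             /* no payload */
    w->frm_ptr = NULL;          /* not derived from an existing frame */
    w->saddr = saddr_net;
    w->daddr = daddr_net;
    w->sport = htons(sport_host);
    w->dport = htons(dport_host);
    w->seq = htonl(isn_host);
    w->flags = TCP_FLAG_SYN;
}
```

Keeping the packet-bound fields in network order at this point means the emitter can copy them into the outgoing headers without further conversion.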



emt_work_array_t - Convenience type for passing around batches of emt_work_t's. 
typedef struct _emt_work_array_t { 
    struct _emt_work_array_t * next; 
    int n_elt; 
    emt_work_t work_elt[CWT_PER_EWA]; 
} emt_work_array_t; 
Fields: 

next Linked list overhead. 

n_elt Number of emt_work_t's contained herein. 

work_elt[CWT_PER_EWA] Array of data to be worked on. 



emt_init 

Handles initialization for the emitter module. 
void emt_init( config_info_t * ci) 
Parameters: 

ci information read from the config file. 

Returns: 

-1 on error 

0 otherwise 


emt_shutdown 

Handles shutdown for the emitter module. 
void emt_shutdown() 

emt_dowork 

Does work for a little while, then yields. 

void emt_dowork() 
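
The "work for a little while, then yield" pattern can be sketched as a loop with a fixed per-call budget. Everything in this sketch is an assumption made for illustration: the work_queue_t stand-in, the EMT_BUDGET value, the function name emt_dowork_sketch, and the use of sched_yield. The actual emitter queue and yielding mechanism are not described in this section.

```c
#include <sched.h>
#include <stddef.h>

#define EMT_BUDGET 32           /* assumed number of items handled per call */

/* Hypothetical stand-in for the emitter's pending work. */
typedef struct work_queue {
    int pending;                /* items waiting to be emitted */
    int done;                   /* items already emitted */
} work_queue_t;

/* Handle at most EMT_BUDGET pending items, then give up the processor so
 * that other threads can run. Returns the number of items handled. */
int emt_dowork_sketch(work_queue_t *q)
{
    int handled = 0;
    while (q->pending > 0 && handled < EMT_BUDGET) {
        q->pending--;           /* a real emitter would build and send a packet here */
        q->done++;
        handled++;
    }
    sched_yield();              /* yield whether or not the budget was used up */
    return handled;
}
```

Bounding the work done per call keeps a long backlog in one module from starving the others that share the same scheduling loop.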

Although the present invention has been described and illustrated in the context of testing 
a CDN content staging server, this is not a limitation of the present invention. One of ordinary 
skill in the art will recognize that systems infrastructure underlying the present invention is 
suitable for testing a variety of network-based systems including web servers, proxy servers, 
DNS name servers, web server plugins, browsers, and the like. Thus, another illustrative 
production environment is a web hosting environment with the system under test being any 
generic web server. Moreover, by adapting the test logic used to determine "equivalent output" 
between a production system and the SUT, real-world workloads can be used to test and validate 
new functionalities, regardless of the specific nature of the SUT. 

Having thus described our invention, the following sets forth what we now claim. 